DEPARTMENT: Computer Science
SUBJECT CODE/COURSE TITLE: CS 325/CIT 348 [Data Mining]
CLASS HOURS: 4 hours per week
CREDITS: 4
PREREQUISITE:
TEXTBOOKS: Introduction to Data Mining [ISBN: 0321321367], P. Tan, M. Steinbach, & V. Kumar, Pearson Prentice Hall, 2006
REFERENCE: Data Mining: Introductory and Advanced Topics [ISBN: 0130888923], M. Dunham, Pearson Prentice Hall, 2003; Internet; Journals
SEMESTER: Spring 2011
INSTRUCTORS: Dr. A. Joseph and Dr. J. Lawler
Course Description: This course provides an overview of topics such
as data mining and knowledge discovery; data mining with structured and
unstructured data; foundations of pattern clustering; clustering paradigms;
clustering for data mining; data mining using neural networks and genetic
algorithms; fast discovery of association rules; applications of data mining to
pattern classification; and feature selection. The goal of this course is to
introduce students to current machine learning and related data mining methods.
It is intended to provide enough background to allow students to apply machine
learning and data mining techniques to learning problems in a variety of
application areas.
Professor: Dr. A. Joseph
Office:
Telephone: 212 346 1492
Email:
Office Hours: Monday (NYC) 9:00am – 2:00pm
Final examination: 35%
In-class examinations (six 20-minute exams): 30% [best 5 of 6]
Homework: 5%
Student participation and contribution: 15%
Coordinator: 5%
Journal: 10%
Project and project presentation: 15% (3% for presentation)
Extra credit assignment (optional; only for students who are otherwise
fulfilling all the course requirements): 10% (due week 12 and no later)
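The weights listed above can be combined into a single course percentage. The Python sketch below is illustrative only and is not the instructors' stated method. The function name `course_percentage` is hypothetical, and the sketch assumes that the in-class examination component is the mean of the best five of six scores, that the 5% coordinator and 10% journal items together make up the 15% participation component, and that extra credit adds up to 10 percentage points on top of the total.

```python
# Illustrative sketch. The weights come from the list above; how they are
# combined is an assumption, not the instructors' stated method.
WEIGHTS = {
    "final_exam": 35,
    "in_class_exams": 30,   # best 5 of 6
    "homework": 5,
    "participation": 15,    # assumed to cover coordinator (5) + journal (10)
    "project": 15,          # includes 3 points for the presentation
}


def course_percentage(final_exam, in_class_exams, homework,
                      participation, project, extra_credit=0.0):
    """Return the weighted course percentage.

    Every argument is a score on a 0-100 scale. in_class_exams is a list
    of exactly six scores, of which only the best five are counted.
    """
    if len(in_class_exams) != 6:
        raise ValueError("expected six in-class examination scores")
    best_five = sorted(in_class_exams, reverse=True)[:5]
    exam_average = sum(best_five) / 5

    weighted_sum = (
        WEIGHTS["final_exam"] * final_exam
        + WEIGHTS["in_class_exams"] * exam_average
        + WEIGHTS["homework"] * homework
        + WEIGHTS["participation"] * participation
        + WEIGHTS["project"] * project
    )
    # Assumption: extra credit is worth up to 10 points added to the total.
    return weighted_sum / 100 + 10 * extra_credit / 100


if __name__ == "__main__":
    # The lowest in-class score (0) is dropped before averaging.
    print(course_percentage(80, [100, 90, 80, 70, 60, 0], 100, 100, 90))
```

Under these assumptions, a missed in-class examination entered as 0 is simply the score that gets dropped, so it does not lower the examination average.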
90% -- 100% | A
85% -- 89% | B+
82% -- 84% | B
80% -- 81% | B-
75% -- 79% | C+
70% -- 74% | C
65% -- 69% | D+
60% -- 64% | D
Below 60% | F
Note: Grade is computed to the nearest whole number.
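As a minimal sketch of how the scale above maps a percentage to a letter grade, the Python function below rounds to the nearest whole number and then looks up the band. The name `letter_grade` is hypothetical, and the treatment of exact halves (rounded up here) is an assumption, since the note does not say how ties are handled.

```python
import math

# Lower bound of each band from the scale above, highest first.
GRADE_BANDS = [
    (90, "A"), (85, "B+"), (82, "B"), (80, "B-"),
    (75, "C+"), (70, "C"), (65, "D+"), (60, "D"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter grade.

    Assumption: "nearest whole number" rounds exact halves up, so 89.5
    is treated as 90. Python's built-in round() would round some halves
    down, which is why math.floor(x + 0.5) is used instead.
    """
    rounded = math.floor(percentage + 0.5)
    for lower_bound, letter in GRADE_BANDS:
        if rounded >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    for score in (89.5, 89.4, 81.6, 59.4):
        print(score, letter_grade(score))
```

Rounding before the lookup matters at the band edges: with this rule 89.5 falls in the A band, while 89.4 rounds to 89 and stays at B+.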
Learning Objectives and Outcomes
Students are expected to accomplish the following learning
objectives and attain the corresponding outcomes by the end of the course.
Objective #1
Students will develop
an intimate understanding of data and their characteristics.
Outcomes
a. Demonstrate a clear understanding and knowledge of the
complexity and possible solutions to the problem of data collection and data
organization capabilities and the available expertise to analyze the data.
b. Know when to determine and prepare data for quality
analysis and its importance to informed decision making as well as be able to
identify and clearly explain at least six indicators of data quality.
c. Able to define and discuss a global definition of data warehouse as well as
know the categories of the data it contains and the main transformation methods
used to prepare them.
d. Understand and know different ways in which data are
characterized as well as how to identify and preprocess them.
e. Able to demonstrate deep knowledge and understanding of data similarity and
dissimilarity with regard to the operations involved and data analysis.
Objective #2
Students will develop
a sound knowledge and understanding of data preparation and exploration.
Outcomes
a. Able to demonstrate ability to analyze basic
representations and characteristics of raw data, apply different normalization
techniques on numerical attributes, and recognize different techniques for data
preparation.
b. Able to compare different methods for elimination of
missing data as well as compare different methods for outlier detection.
c. Can apply summary statistics such as mean, median,
and standard deviation to capture important characteristics in data sets.
d. Know and be able to explain the purpose and significance of
data visualization, as well as the forms, representations, and procedures
of visualization techniques appropriate for a particular application.
e. Able to identify the differences in dimensionality reduction based on
features and reduction-of-values techniques, and to clearly explain data
reduction in the preprocessing phase.
f. Show unambiguous understanding of the basic principles of feature selection
and feature composition tasks.
g. Demonstrate a clear understanding of the differences between decision tree
and decision rule representation in a classification model.
h. Able to identify the basic components of an artificial
neural network and its properties and capabilities in such learning tasks as
classification and pattern association.
i. Able to describe the main steps of a genetic algorithm with an illustrative
example.
Objective #3
Students will improve
their team-building, social, organizational, and collaborative skills through
assignments, team activities, and projects, skills that they can further
develop in other classes and in their professional careers.
Outcomes
a. Demonstrate an ability to work effectively in teams.
b. Demonstrate the ability for effective verbal and
written communication.
c. Able to differentiate between the different types of learning teams and can
clearly explain the stages of team development and the characteristics of an
effective team.
d. Know the importance of task, friendship, and
interaction to a team’s performance.
e. Able to demonstrate a clear understanding of the role and significance of
team norms, teamwork skills, communication, leadership, decision making, and
conflict management in the effective functioning of a team.
Objective #4
Students will develop
foundational knowledge and understanding of the core concepts of data mining
inherent in classification, cluster analysis, and association analysis, as well
as examples of their applications.
Outcomes
a. Show clear understanding by being able to describe or
discuss hierarchical (e.g., agglomerative), partitional (e.g., k-means), ROCK, and
DBSCAN algorithms as well as their appropriateness to different data clustering
applications.
b. Able to briefly describe supervised, unsupervised, and
relative cluster evaluation measures as well as to compare and contrast them.
c. Able to demonstrate, using illustrative examples, basic knowledge and
understanding of statistical, distance, decision tree, neural network,
rule-based, and support vector machine algorithms in solving the classification
problem.
d. Able to evaluate, compare, and contrast the
performance of two or more classifier models using different techniques.
e. Demonstrate the ability to differentiate between and descriptively explain
the different types of association analysis algorithms such as Apriori,
sampling, partitioning, parallel, distributed, and frequent pattern growth.
f. Able to compare and contrast qualitative and quantitative measures for
evaluating the quality of association patterns.
Objective #5
Students will acquire the knowledge, skills, and
expertise needed to design and develop innovative and imitative algorithms for
competitive products, processes, or services in a technology-oriented
enterprise related to financial and health informatics.
Outcomes
a. Develop skills and expertise in applying the knowledge
of classification, clustering, and association algorithms to solve problems
relating to financial and health care services, processes, or products.
b. Demonstrate the needed know-how to design and develop
or modify algorithms for specific data mining applications in finance and
health care.
Objective #6
Students will be
provided with opportunities to increase their knowledge of and exposure to
entrepreneurial skills through course activities, assignments, and interactions
with mentors.
Outcomes
a. Acquire entrepreneurial skills while interacting with
financial, health care, and/or information technology experts for at least 10
hours to determine and execute the project as measured by different reporting
mechanisms.
Tentative Examination Schedule:
Course Section | In-class Examination Dates | Project Due Date | Final Examination Date
CS 325/CIT 348 CRN: 23191/23190 | 2/9, 2/23, 3/9, 3/30, 4/13, & 4/27 | April 14, 2011 | May 5, 2011
Note 1: In general, the
lessons will highlight inquiry-based lecture-discussion and may include
storytelling. The central focus of the course will be critical thinking and
problem-solving. To get the most out of the course, each student is expected to
study the reading assignments and genuinely attempt each homework problem
before coming to class. The idea is to come to class ready with questions about
and ideas relating to the course materials and associated problems.
Note 2: In the interest
of learning, it is very important to
come to class prepared to learn: do all required assignments. Failure to do so
could diminish your ability to get the most out of each lesson and the class.
Remember that learning is action oriented. That is, it is not enough to come to class to listen to what others have to say.
You should come to class prepared to become involved in all aspects of classroom activities because learning is an active process.
Note 3: It is very
important that you read and familiarize yourself with the SCSIS Statement of Student Responsibilities (see Blackboard).
TOPICS
Weeks | Topics | Assignments
1-2 | Data: Types of data; data quality; data preprocessing; measures of similarity and dissimilarity; large data sets; and data warehouses. | Readings: Chapter 2. Problems: Chapter 2/ 1, 3, 6, 7, 9, & 12.
3-4 | Data Preparation and Exploration: Raw data representation, characteristics, and transformation; missing data; summary statistics; decision trees and rules; data reduction techniques; neural networks; genetic algorithms; and visualization. | Reading: Chapter 3 & handouts. Problems: Chapter 3/ 1, 2, 4, 6, & 17.
5-7 | Classification: Introduction; approach to solving a classification problem; decision tree induction; model overfitting; evaluating classifier performance; comparing classifiers; rule-based classifiers; nearest-neighbor classifiers; Bayesian classifiers; neural networks; and support vector machines. | Reading: Chapters 4 & 5. Problems: Chapter 4/ 1; Chapter 5/ 1.
8-10 | Cluster Analysis: Introduction; K-means; agglomerative hierarchical clustering; DBSCAN; cluster evaluation; and clustering with categorical attributes. | Reading: Chapter 8. Problems: To be assigned.
11 | Project Submission and Presentation |
12-13 | Association Analysis: Problem definition; generation and compact representation of frequent itemsets; rule generation; algorithms (sampling, partitioning, parallel, distributed, & FP-growth); and measurement and evaluation of association patterns. | Reading: Chapter 6. Problems: Chapter 6/ 1.
13 | Review for Final Examination |
14 | Final Examination |
Note 1: This course
is structured around freely formed small collaborative groups in a
cooperative learning environment.
Students are encouraged to work together in their respective groups to
form effective and productive teams that share the learning experience within
the context of the course, help each other with learning difficulties, spend
time to get to know each other, and spend time each week to discuss and help
one another with the course work (content and assignments). Each group member
is responsible for the completion and submission of each assignment.
Each group member will be individually graded.
Note 2: During the
first class session, student background information will be collected to get
a sense of the diversity of student educational background and an assessment
test will be given to determine students’ knowledge of the subject.
Group project: Students in
small groups of two to four will participate in a project or research and
prepare a report that involves the use of a low-level or high-level
programming language. In this project,
students will write a program to determine the solution of a technical
problem, and then demonstrate their knowledge and understanding of how the
program is processed in the typical digital computer system. The grade assigned
to individual students for the group project will be based upon their
involvement in the following items: programming, report writing, proofreading
and correction of the program code and written report, and combinations of the
above.
Web support: This course
is supported with most or all of the following Blackboard postings: lesson
questions, lessons (PowerPoint), instructions and guidelines pertaining to
the course, computer architecture and related news, group and class
discussion boards, email correspondence about the course, homework
solutions, examination grades, and miscellaneous course-related activities
and information.
Supplementary materials: Handouts in
class or web postings of current events and issues affecting computer
architecture. Some books that may be
helpful for the course will be posted on Blackboard.
In-class group activity and participation: Students are
encouraged to bring to class current newsworthy events in computer
organization/architecture and related news to share with the class. Students
will inform the class of the news events and their significance to computing.
Devote 15-20 minutes to this activity.
The collaborative groups are designed to function
outside of the classroom.
Collaborative group activities will be reinforced inside the class
during the lessons. Student groups are
encouraged to function cohesively and to participate in class activities.
Devote 30-45 minutes of each class period
to collaborative group activities.
Students are strongly encouraged to download posted
lessons from Blackboard, review them, and come prepared to ask intelligent
questions about the material in these lessons. Every effort will be made to
present each lesson using the storytelling format supported with subsequent
discussion and elaboration on the central points of the lesson. The key
elements of a story are the following: causality,
conflict, complication, and character.
The following excerpts about collaborative learning
are from research documents:
· In the university environment, educational success and social adjustments
depend primarily on the availability and effectiveness of developmental
academic support systems.
· Most organized learning occurs in some kind of group; group characteristics
and group processes significantly contribute to success or failure in the
classroom and directly affect the quality and quantity of learning within the
group.
· Group work invariably produces tensions that are
normally absent, unnoticed, or suppressed in traditional classes. Students
bring with them a variety of personality types, cognitive styles, expectations
about their own role in the classroom and their relationship to the teacher,
peers, and the subject matter of the course.
· Collaborative learning involves both management and decision-making skills to
choose among competing needs. The problems encountered with collaboration have
management, political, competence, and ethical dimensions.
· The two key underlying principles of the collaborative
pedagogy are that active student involvement is a more powerful learning tool
than passive attendance and that students working in groups can make for
more effective learning than students acting alone. The favorable outcomes of
collaborative learning include greater conceptual
understanding, a heightened ability to apply concepts, and improved
attendance. Moreover, students becoming responsible for their own
learning is likely to increase their skills for coping with ambiguity,
uncertainty, and continuous change, all of which are characteristics of
contemporary organizations.
An entrepreneur is one who creates a new activity in the face of risk and
uncertainty for the purpose of achieving success and growth by identifying
opportunities and putting together the required resources to benefit from them.
Creativity is
the ability to develop new ideas and
to discover new ways of looking at problems and opportunities.
Innovation is
the ability to apply creative solutions to those problems and opportunities to
enhance or to enrich people’s lives.
Each group may be viewed as a small business that is
seeking creative and innovative ways to maximize its product: the academic
outcome, or average group grade. A
satisfactory product is the break-even group average grade of 85%. Groups
getting average grades above 85% are profitable enterprises.